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MODERN  EMPIRICAL  STATISTICAL  SPECTRAL  ANALYSIS 


Emanuel  Parzen 

Institute  of  Statistics 
Texas  A  &  M  University 


INTRODUCTION* 


This  paper  has  two  aims:  (1)  to  provide  perspectives  on  the 
diverse  paths  of  analysis  which  are  available  in  1980  to  estimate 
the  spectrum  of  an  observed  time  series;  and  (2)  to  describe  pro¬ 
posals  for  optimal  statistical  spectral  estimation  procedures 
which  combine  autoregressive  spectral  estimators  and  log  spectral 
estimators.  It  is  proposed  that  empirical  statistical  spectral 
analysis  should  be  an  adaptive  procedure  for  forming  an  iterative 
spectral  estimator  (an  iterative  estimator  is  one  composed  of 
estimators  obtained  in  different  steps  of  the  analysis).  There 
are  three  parts:  I.  Basic  concepts  of  time  series  spectral 

analysis;  II.  Entropy  distances,  autoregressive  spectral  esti¬ 
mators  and  log  spectral  estimators;  III.  An  outline  of  empirical 
spectral  analysis  . 


I.  BASIC  CONCEPTS  OF  TIME  SERIES  SPECTRAL  ANALYSIS 

By  spectral  analysis  of  a  time  scries  Y(t)  one  means  fitting 
to  the  time  series  a  spectral  representation  of  the  form 

Y(t)  =  Je2l'1Xt  d¥(A)  . 


*This  research  was  supported  by  the  Office  of  Naval  Research 
(Contract  N00014-78-C-0599)  . 
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Spectral  analysis  has  as  its  aim  the  determination  of  the  proper¬ 
ties  of  the  function  H'(A)  .  Model  identification  is  concerned 
with  determining  the  qualitative  properties  of  Y(A)  ,  and  para¬ 
meter  estimation  is  concerned  with  determining  the  quantitative 
properties  of  H'(A)  .  This  chapter  defines  some  basic  concepts 
of  spectral  representations  of  time  series. 


1.  DISCRETE  PARAMETER  AND  CONTINUOUS  PARAMETER  TIME  SERIES 

•  The  theory  of  time  series  analysis  discusses  separately  dis¬ 
crete  parameter  time  series  {Y(t),  t=0,  +1,  ...}  and  continu¬ 
ous  parameter  time  series  {  Y(t),  -°>  <  t  <  «}  .  This  paper  dis¬ 

cusses  only  discrete  parameter  time  series.  The  range  of  the 
frequency  variable  A  is  taken  to  be  -0.5  to  0.5  in  the  discrete 
parameter  case,  and  -°°  to  in  the  continuous  parameter  case.  In 
many  scientific  fields,  a  discrete  parameter  series  Y(n)  arises 
by  observing  a  continuous  parameter  time  series  Z(t)  at  equi- 
spaced  times  t  =  nD  ,  so  that  Y(n)  =  Z(nD)  .  One  calls  D  the 
sampling  interval.  We  assume  a  spectral  representation 


Z(t)  =  J  e2nlttl)  dw  . 

—  00  " 

Then 

Y(n)  *  Z(nD)  =  /  e2^inDiD  du> 

—  00 

Let  x  =  D<*>.  Then 

Y(n)  =  /  e2,rinX  i  ^z(^)  dA  . 

—00 


Write  the  integral  from  -<®  to  ™>  as  the  sum  of  integrals  over  the 
intervals  k  -  0.5,  k  +  0.5  for  k  =  0,  +  1,  ...;  the  latter  inte¬ 
gral 


k+0.5 

I 


k-0.5 


2winX 

e 


D  ,J’z(5)  dx 


0.5 


-0.5 


2irinA  1 


1  ,  'X'+k' 

D  ^Z^  D 


dA' 


Samp ■ Ing  Theorem:  Y(n)  has  the  spectral  representation 

Y(n)  =  /  e2"inA  }  dA  ^ 

-0.5 
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*y(A) 


oo 


k=-a 


.  ,A+k 

*Z(~D“ 


)  . 


Further,  if  Z(t)  is  bandlimited,  in  the  sense  that  i|<7(u>)  =  0  for 
1  “ 

M  >2D  *  then 

^y(A)  =  ^  *  0z(u)  =  D  i|»y(Do))  . 

Using  these  formulas  one  can  rewrite  the  formulas  one  obtains 
for  the  spectrum  of  Y(t)  ,  t  =  0,  +  1,  ...  as  formulas  involving 
the  spectrum  of  the  sampled  time  series  Z(t),  -®  <  t  <  ®  . 

2.  SOME  TIME  SERIES  MODEL  TYPES 

Observed  discrete  parameter  time  series  often  may  be  regarded 
as  sums  of  different  types  of  functions. 

Pure  harmonics  of  period  p  _>  2  are  functions 

Y(t)  =  A  cos  — t  +  B  sin  —  t  ; 

P  P 

for  which  V(\)  is  a  function  of  bounded  variation  which  changes 
only  in  jumps;  f(A+0)  -  T(A-O)  =  0  for  all  A  in  0  <  \  <  0.5 
except  A  =  JL 

P 

2 

Transients  are  square  summable  functions,  £  Y  (t)  <  ® 

t=-“ 

Then  'f(A)  has  a  derivative  i{/(A)  =  4,,(A)  satisfying 

4>(A)  =  l  Y (t)  e"2"itX  , 
t=“°° 

'  0.5  9 

Y(t)  =  /  eZ1T1CA  ¥(A)  dA  . 

-0.5 


An  important  example  of  a  transient  time  series  Y(t)  is  a  spike 
which  is  non-zero  at  only  one  time  t^;  then 


t|>(A)  =  Y(tg)  e 


-2iritQA 


2 

and  |  ip ( A )  |  =  constant  for  all  A  . 
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The  non-deterministic  component  of  a  time  series  is  often 
assumed  to  be  a  covariance  stationary  time  series  Y(t)  with  zero 
mean  and  covariance  function  the  notation  of  Parzen  (1962)) 

R(v)  =  E[Y(t)  Y (t-h/)  ]  ,  v  =  0,  +  1 . 

The  correlation  function  is 

p(v)  =  ~^y  =  Correlation  [Y(t)  ,  Y(t+v)]  . 

We  divide  stationary  time  series  into  three  types,  (1)  white 
noise,  (2)  short  memory,  or  (3)  long  memory,  whose  definitions 
are  given  in  the  next  section. 

A  pure  harmonic  of  period  p  obeys  the  difference  equation 


Y(t)  -  <J>Y(t-l)  +  Y(t-2)  =  0 


where  4>  =  2  cos 


Consequently  if  a  time  series  Y(t)  is  the 


sum  of  harmonics  and  a  stationary  time  series  a  useful  way  to 
identify  a  model  for  the  time  series  Y(t)  is  to  introduce  a 
transformed  time  scries 


Y(t)  =  Y ( t)  -  <j>Y ( t-1)  +  Y(t-2) 

and  to  model  Y(t)  as  a  stationary  time  series.  The  final  model 
fitted  to  the  time  series  Y(t)  is  called  an  iterated  model 
when  it  has  the  form 


Y(t)  -j  [-> 


c(t)  white  noise  . 


To  estimate  the  spectrum  of  a  time  series,  one  must  identify 
the  qualitative  model  types  of  which  the  time  series  is  composed 
before  one  can  estimate  quantitatively  their  properties.  It  may 
be  wisest  to  carry  out  iji  parallel  several  of  the  approaches  to 
time  series  computations  described  in  Chapter  III.  I  express 
this  point  of  view  in  a  motto:  "If  one  can  think,  of  two  or  more 
ways  of  solving  the  problem,  one  should  solve  it  in  two  or  more 
ways." 


3.  STATIONARY  TIME  SERIES  MODEL  TYPES 

A  stationary  time  scries  Y(t)  has  a  spectral  representation 
in  terms  of  a  stochastic  integrand  f(A)  satisfying 


m 
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K|df(X)|2  =  R(0)  f(A)  dA  =  R(0)  dF(A)  . 

where  f (X )  and  F( A)  are  spectral  density  and  spectral  distribution 
functions,  respectively,  whose  definitions  are  given  in  this 
section. 

White  noise,  or  a  no  memory  time  series,  is  a  time  series 
of  independent  random  variables;  it  satisfies  £  |R(v)j  =  0  . 

v>0 


To  introduce  the  notion  of  a  time  series  of  short  memory 
type,  we  consider  a  stationary  time  series  Y(t)  and  assume  that 

co 

|  R ( v)  |  <  «  .  We  define  the  power  spectrum  of  the  time  series 

yrs-.cn 

to  be 

ao 

S(A)  =  l  e“27TiXv  R(v)  ,  -0.5  <  A  £  0.5  ; 

y=s— oo 


it  satisfies 


R(v)  =  /  e2"ivA  s(x)  dA  ^  v  =  0>  +  ... 

-0.5 

We  define  the  spectral  density  of  the  time  series  by 


f(A)  =  l  e~2llivX  p(v)  ,  -0.5  <  A  <  0.5  ; 

V=— oo 

It  provides  a  spectral  representation  of  the  correlation  function. 


p(v)  =  /  e2TTivA  ,  v  =  0,  +  1,  ... 

-0.5 


To  define  the  spectrum  of  a  stationary  time  series  whose 
correlation  function  p(v)  is  not  summable  define,  for  any  T  >  0, 


l 

j,k=l 


e-2TTiAj 


2TTiAk 

e 


p(j-k)  ; 


it  is  a  non-negative  function  by  the  non-negative  definite  pro¬ 
perty  of  p(v)  .  One  can  write 
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fT<»>  •  l  e'2’1Xv  (l-^nCv)  , 

!v]<T 

(l-^-)p(v)  -  /°'5  e2’i>v  f«>  dX  . 

-0.5 

When  p  (v)  is  summable,  f  ( X )  — »  f(X)  0  .  Otherwise,  o(v)  is 
the  limit  (as  T  — *“)  of  Fourier  transforms  of  non-negat i ve  func¬ 
tions,  and  therefore  there  exists  a  spectral  distribution  function 
F(X)  ,  -0.5  _<  X  <_  0.5  such  that 

0.5  Ott  i  } 

P(v)  =  /  e2niAv  dF(A)  , 

-0.5 

X 

and  F  (A)  =  /  f  (u)  du  F(A)  . 

-0.5  1 

An  important  diagnostic  tool  of  the  type  of  a  stationary 
time  series  is  its  spectral  log  range ,  defined  by 

SPLR  =  lim  log  max  f  (X)  -  log  min  f  (X)  . 

T-*  »  X  1  X  1 

The  memory  type  of  a  stationary  time  series  is  classified 
according  to  the  behavior  of  its  spectral  log  range: 


NO  MEMORY 

SHORT  MEMORY 

LONG  MEMORY 

SPLR  =  0 

0  <  SPLR  <  •» 

SPLR  =  oo 

A  stationary  time  series  has  short  memory  if  )  |p(v)|  <  <® 

\J  —  —  co 

and  the  spectral  density  f(X)  4  0  for  any  X  ;  then  there  exist 
positive  constants  Cj  and  such  that  0  <  £  f(X)  £  <  <*> 

for  all  X.  For  a  short  memory  series,  f(X),  f  A(X)  ,  and  log 
f(X)  arc  nil  Integral) Ip  over  the  interval  -0.5  <  X  <  0.5  . 


4.  STATIONARY  FILTER  THEOREM 

The  interpretation  of  the  power  spectrum  comes  from  the 
following  important  theorem. 

Filter  Theorem.  If  Y(’)  is  stationary  with  spectral  density 


fy(A)  ,  and 


/•( O  =  ).  b(t-s)  Y (s)  =  l  b(s)  Y(t-s) 


where 


l  b2(s)  <  «  ,  B(A)  =  X  b(s) 


tlu'ii  /.(•)  Is  stationary  with  spectral  density  and  covariance 
function  given  by 

,  Ry(0) 

fz(A)  =  fy(A)  |B(A)|2^  , 


Rz(v)  =  l  R^s)  (v+s) 
s=-°° 


defining  R^(v)  =  £  b(k)  b(k+v) 

k=-co 


5.  WHITENING  FILTERS 

Another  major  aim  of  time  series  analysis  is  to  obtain 
whitening  filter  representations  of  Y(t)  ,  t  =  0,  +  1,  ... 
of  the  form 


P  q 

l  a  Y(t-j)  =  l  b  (t-k) 


where  (n(t)  ,  t  *  0,  +  1,  ...}  is  a  time  series  of  "simple" 
structure;  in  particular  n(t)  might  be  white  noise  or  a  series 
of  impulses.  Whitening  filter  analysis  has  its  aim  the  deter¬ 
mination  of  the  parameters  p,q,  a^,  a^,  ...»  a  ,  b^,  b^,...,b  ,  and 
series  n(t)  ,  especially  its  spectral  representation  ^ 


i(t)  =  / 


e27T±tA  d¥n(A)  . 


The  whitening  filter  is  called:  an  autoregressive ,  or 
AR,  filter  if  q  =  0  ;  a  moving  average ,  or  MA,  filter  if  p  =  0  ; 
and  an  autoregressive  moving  average,  or  ARMA,  filter  if  p  and  q 


P 
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are  both  non-zero.  The  most  frequently  used  filters  are  AR 
filters . 

From  a  whitening  filter  representation  of  Y(t)  one  may  infer 
properties  of  its  spectral  representation;  define 

p  q 

.  2tt iX .  v  —  2ti i j  A  ,  .  2niA.  <-  ,  — 2n ikA 

g  (e  )  =  ).  a  e  ,  h  (e  )  =  ).  b  e 

p  j=0  3  q  k=0 

called  respectively  the  AR  and  MA  transfer  function.  Then 

2iritA  .  2ttiA.  w  ...  2tt i t A  .  2iriA.  ... 

j  e  g  (e  )dT  (A)  =  J  e  h  (e  )dT  (A)  . 

-0.5  P  Y  -0.5  q  n 

Consequently  (for  all  A^) 


J  0  g  (e27riA)dT  (A)  =  /  °  h  (e2l’iA)dT  (A)  . 


-0.5 


-0.5 


Knowing  g^,  h^ ,  and  ,  one  can  solve  for  . 

When  n(")  is  a  stationary  time  series  we  define  the  spectral 
density  of  Y(-)  by  the  filter  theorem: 


fy(A)  =  f  n  ( A )  — 3- 


h _  (e27TiA)  |  2  _2 

2tt1A,  ,2  °n 


*P(e  > 


where  O2  is  a  "measure"  of  R  (0 )  / Ry.  (0 )  ,  such  as 


-2  En  <t) 

a  =  - x— — 

n  EY2(t) 


The  whitening  filter  is  written  symbolically  in  terms  of 
the  lag  operator  L  defined  by  LY(t)  =  Y(t-l)  .  Then 


g  (L)Y(t)  =  h  (L)  n(t)  ,  0  ( t )  =  Y  ( t ) 


i  ( 

q 


6.  BASIC  SAMPLE  STATISTICS 

To  form  estimators  of  parameters,  such  as  R(v)  and  f(A), 
we  can  either  seek,  estimators  which  are  optimal  according  to  an 
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estimation  criterion  such  as  maximum  likelihood  or  we  can  form 
estimators  which  seem  "natural"  and  determine  their  asymptotic 
optimality  properties.  A  natural  estimator  of  R(v)  =  E[Y(t)Y (t+v) ] 
from  a  sample  { Y ( t ) ,  t  =  1,  T}  is 

1 

R(v)  =  -  l  Y ( t )  Y(t+v) 
t=l 


called  the  sample  covariance  function.  Note  that  we  divide  by 
T  rather  than  by  T-v  in  order  to  obtain  a  function  R(v)  wtiich 
is  positive-definite 

n 

l  c.c.R(j-k)  _>  0  for  all  n,  c ,  ...,  c 
j , k=l  J  k  in 

Then  p (v)  is  estimated  by 
T-v 

y  Y(t)  Y (  t+v) 

t=l _ 

y  Y2(t> 

t=i 


P(v)  = 


R(v). 

R(0) 


called  the  sample  correlation  function.  These  functions  possess 
spectral  representations 


R(v)  =  /  g2TTiXv  g(A)  dA  ^ 
-0.5 

S(v)  =  /  e2'rlAv  f (A)  dA  , 

-0.5 


in  terms  of 


S(A)  =  i 


l  Y(t)  e 

t=l 


■2nit A 


f  ( A )  = 


l  Y( t)  . 

t  =  l _ 

T  \ 

)  Y2(t) 

t  =  l 


-  2tt  i  t  A 


It  should  be  noted  that  these  functions  provide  a  generalized 
harmonic  analysis  of  Y(-)  in  the  sense  of  Wiener  (1930). 

We  call  S ( A )  the  sample  £ower  spectrum  and  f(')  the  sample 
spectral  density.  They  are  natural  estimators  of  S(>)  and  f(>) 
respectively,  but  they  are  very  wiggly  functious  and  lack  most 
of  the  properties  of  optimal  estimators.  Thus  arises  the  need 
for  a  sophisticated  theory  of  statistical  spectral  analysis. 

a  ^ 

One  reason  for  using  p(v)  and  f  (X )  as  basic  diagnostic 
statistics  for  observed  time  series  is  that  they  possess  fast 
computation  algorithms,  using  the  Fast  Fourier  transform.  Given 
a  sample  (Y(t),  t  =  1,  ...»  T)  one  proceeds  as  follows. 

A.  Pre-processing.  To  analyze  a  time  series  sample  Y(t),  t  =  1, 
...,  T  ,  one  will  proceed  in  stages  which  often  involve  the  sub¬ 
traction  of  or  elimination  of  strong  effects  in  order  to  see  more 
clearly  weaker  patterns  in  the  time  series  structure. 

The  aim  of  pre-processing  is  to  transform  Y(-)  to  a  new 
time  series  Y(-)  which  is  short  memory  (a  zero  mean  stationary 
time  series  whose  spectral  density  has  finite  log  range).  The 
basic  pre-processing  operations  are  memory  less  transformation 
(such  as  square  root  and  logarithm),  detrending,  "high  pass" 
filtering,  and  differencing.  One  usually  subtracts  out  the  sample 

-  1  J 

mean  Y  =  —  }  Y(t)  ;  then  the  time  series  actually  processed 

T  t=l 

is  Y(t)  -  Y  .  If  the  mean  Y  is  a  large  number,  it  should  be  sub¬ 
tracted;  the  variations  in  Y(t)  are  then  the  variations  of  Y(t) 
about  its  mean.  The  sample  mean  Y  and  sample  variance  R(0)  should 
always  be  recorded. 

B .  Sample  Fourier  Transform  by  Data  Windowing,  Extending  with 
Zeroes,  and  Fast  Fourier  Transform.  The  first  step  in  a  compre¬ 
hensive  analysis  of  a  pre-processed  time  series  sample  should 
always  be  the  computation  of  the  sample  Fourier  transform 

T 

ij)(X)  =  J  Y(t)  exp  (—  2tt i A t ) 
t  =  l 

at  an  equi-spaced  grid  of  frequencies  in  0  <  X  <  1  ,  of  the  form 
X  =  —  ,  k  =  0,  ...,  Q-l  .  We  call  Q  the  spectral  computation 
number.  One  should  always  choose  Q  T  ,  and  we  recommend  Q  2T . 

Prior  to  computing  <ji(X)  ,  one  should  extend  the  length  of 
the  time  series  by  adding  zeroes  to  it.  Then  i)j(X)  ,  X  =  —  , 


can  be  computed  using  the  Fast  Fourier  transform. 

In  addition,  one  should  compute  a  sample  "data  windowed" 
Fourier  transform 


tK,(X)  =  l  Y(t)W(^r)  exp(-2TiiXt)  . 
W  t=l  1 


To  understand  the  effect  of  the  window,  one  replaces  Y(t)  by  its 
spectral  representation 


0.5 

Y(t)  =  /  exp  (2TTiX't)  dVfX* )  ; 


-0.5 


0.5 

then  (X)  =  /  w  (X-X  ’ )  d'P(X’) 

W  -0.5 


where  w  (X)  =  W(^)  exp(-2iriXt)  . 

t=l 


Considerations  involved  in  the  choice  of  data  windows  are  dis¬ 
cussed  in  Harris  (1978)  . 

C.  Sample  Spectral  Density.  The  sample  spectral  density  f ( X ) 
is  obtained  essentially  by  squaring  and  normalizing  the  sample 
Fourier  transform; 


f(X) 


lMA)l2 

i  Q"1  ~  k 

Uo1^1 


2 


1. 


Q-l 


It  is  a  function  with  period  1,  whose  domain  is  taken  to  be 
-0.5  <  X  <  0.5  (or  0  <  X  <  1),  which  integrates  to  1  and  provides 
a  spectral  representation  of  p(v). 


1).  Sample  Correlation  Function.  The  sample  correlation  function 
P (v)  is  computed  (using  the  Fast  Fourier  Transform)  by 


p(v) 


1  1  k  ~  k 

Q  e-p<2"V)  f(Q>  • 


which  holds  for  0  <  v  <  Q-T 
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K •  Sample  Spectral  Distribution  Function. 

X 

F(X)  =  2  /  f  ( A ' )  dX’  ,  0  £  X  <  0.5  ; 

0 

the  graph  of  F(X)  provides  qualitative  diagnostics  of  the  time 
series  model  type. 

The  foregoing  basic  statistics  are  the  building  blocks  of 
the  smooth  spectral  estimators  whose  theory  is  discussed  in  the 
rest  of  this  paper. 


II.  ENTROPY  DISTANCES,  AUTOREGRESSIVE  SPECTRAL  ESTIMATORS,  AND 

LOG  SPECTRAL  ESTIMATORS 

The  theory  of  statistical  spectral  analysis  in  1980  should 
be  based,  in  my  opinion,  on  the  role  in  statistical  inference  of 
entropy  and  information  numbers.  The  credit  for  emphasizing 
this  perspective  should  be  given  to  the  two  pioneering  develop¬ 
ments  of  MEM  (maximum  entropy  method)  of  Burg  (1967)  and  AIC 
(information  criterion)  of  Akaike  (1974). 

Given  a  sample  Y(t),  t  =  1,  2,  ...,  T  of  a  discrete  para¬ 
meter  time  series  Y(t),  t  =  0,+  1,  ...,  the  general  problem  of 
statistical  inference  is  to  infer  the  probability  distribution 
of  the  observed  random  variables.  A  probability  model  whose 
goodness  of  fit  to  the  data  is  an  ever-present  hypothesis  is 
that  Y(t),  t  =  0,  +  1,  ...  is  a  zero  mean  Gaussian  stationary 
time  series  with  covariance  function  R(v)  =  E[Y(t)  Y(t+v)]  , 
v  =  0,  +  ,  ....  and  correlation  function  p(v)  =  R(v)/R(0)  . 

When  discussing  statistical  inference,  it  is  usual  to  assume  that 
the  process  is  ergodic  which  requires  us  to  make  an  assumption 
such  as  R(v)  is  absolutely  summable:  E|R(v)|  *  m  .  The  power 
spectrum  S(A)  and  spectral  density  function  f(A)  are  defined  (in 
Section  1.3)  as  the  Fourier  transforms  of  R(v)  and  p(v)  respec¬ 
tively. 

1.  APPROXIMATE  LIKELIHOOD  FUNCTION  OF  STATIONARY  GAUSSIAN  TIME 
SERIES 

One  approach  to  forming  optimal  estimators  of  statistical 
parameters  is  to  obtain  a  formula  for  the  likelihood  or  joint 
probability  density  function  of  Y(l),  ...,  Y (T) ,  which  we  denote 
by  fg  (Y(l) ,  ...,  Y(T) )  ;  the  subscript  0  indicates  that  it  is  a 
function  of  the  unknown  parameters  9  ,  log  is  natural  logarithm, 

*  is  complex  conjugate  transpose.  Then 

-21og  f  (Y(l),  ....  Y(T) )  =  log{(2Tt)T  det  Kn}  +  Y.V^Y 

u  o  r  o  t 
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where  Y^  =  (Y(l),  ....  Y(T))  and  Kq  =  EY^Y  is  a  covariance 
matrix  with  (s,t)  -  element  equal  to  Rg(s-t)  .  The  subscript 
0  on  Rg(v),  Pq(v) ,  Sg(A),  and  f q < X) ,  indicate  that  they  are 
functions  of  unknown  parameters  8  (which  are  to  be  estimated). 

The  covariance  matrix  K  is  a  Toeplitz  matrix;  asymptotically, 
as  T  tends  to  °°,  all  T  by  T  Toeplitz  matrices  have  the  same 
eigenvectors  exp(-2TTit  j/T)  ,  j  =0,  1,  ...»  T-l  .  The  eigen¬ 
values  of  are  Sg(j/T)  . 

We  prefer  to  express  the  likelihood  in  terms  of  fg(j/T)  . 
Therefore^we  assume  that  the  time  series  Y(t)  has  been  divided 
by  {R(0)}2  so  that  it  can  be  considered  to  have  variance  1,  and 
its  covariance  function  equals  its  correlation  function.  Then 
one  can  show  that  approximately,  for  large  values  of  T, 


-|  log  f  (Y(1),...,Y(T)  =  log2TT  +  /  {logf„ (A)  +  |^7TT>dA 


fe(A) 


-  log2ir  +  H(f;fe) 


T  T 

where  f < A )  =  }  Y(t)  exp-2nitA  t  £  Y  it) 


is  the  sample  spectral  density,  and  the 
defined  by 


number  H  is 


H(f;g)  =  /  {log  g (A)  +  dA  . 


MINIMUM  ENTROPY  DISTANCE  ESTIMATION 


The  maximum  likelihood  estimator  9  is  equivalent  to  the 
estimator  0  minimizing  over  0 


H(f;f  )  =  /  {log  fn(A)  +7itrJ  dA  . 
-0.5  r0U' 


In  order  to  regard  H(f;fg)  as  a  measure  of  "distance"  or  "fit" 
between  the  data  (with  representing  function  f(A))  and  the  model 
(with  representing  function  fg(A)),  we  define  the  entropy 
distance 
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I(f;g)  =  /°^5  {^(iy  -  lo4(x!"  ■  lldX  =  H(f;8)  •  H(f;f) 

Since  u  -  log  u  -  1  •  0  for  all  u,  I  has  two  of  the  properties 
of  a  distance,  namely  T(f;g)  >  0,  I(f;f)  =  0  However  I  does 
not  satisfy  the  triangle  inequality.  Since 

I(f;f0)  =  H(f;f0)  -  H(f;f)  , 

minimizing  ll(f;f0)  with  respect  to  0  is  equivalent  to  minimizing 
I(f;fg).  Minimum  entropy  distance  estimators  0  are  sliown  to  be 
consistent  (as  the  sample  size  T  tends  to  infinity)  by  showing 
that  the  sequence  I(f;f“)  converges  to  zero,  where  f  is  the  true 
spectral  density  function.  If  f  =  f0  for  some  0^  ,  then  one 

can  infer  that  the  sequence  9  converges  to  0^. 


3.  L2  distances  between  spectral  densities 

One  can  relate  entropy  distance  to  the  L2  log  spectral  density 
distance 

0.5 

L  L(f,g)  =  /  {log  f (A)  -  log  g(A)r  dX  . 

-0.5 

2 

Since  u  =  exp  (log  u)  =  1  +  log  u  +  1/2  (log  u)  ,  for  "neighboring" 
f  and  g,  I(f,g)  =  L(f,g)/2  and  minimizing  I(f;f  )  could  be  re¬ 
garded  as  asymptotically  equivalent  to  minimizing  L2L(f;f0)  . 

An  extensive  discussion  of  these  distances  is  given  by  Gray,  Buzo, 
Gray,  and  Matsuyama  (1980). 

The  notation  L2L  is  cnosen  to  emphasize  the  distinction 
between  that  distance  and  the  L2  spectral  density  distance 

0.5  2 

L,(f,L)  =  /  {f(X)  -  f  (X)}  dX  . 

2  *  -0.5  9 


This  distance  has  been  used  for  spectral  estimation  but  it  seems 
not  to  be  justifiable  in  general. 


However  in 
densities,  when 


the  case  of  smoothing  prewhiten^d  sample  spectral 
may  be  expected  to  have  a  small  log-range, 
It  then  may  approximate 


f(,(X) 

L2(f,f0)  may  be  a  justifiable  distance. 


r0. 5  ff(X)  -  f  (Xfl 

— vrr-J 


dX 
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which  is  also  a  useful  "distance". 


4.  MINIMUM  DISTANCE  FORMULATION  OF  OPTIMAL  ESTIMATION 

In  summary,  one  approach  to  forming  "optimal"  estimators 
f(A)  of  the  spectral  density  f(A)  of  a  stationary  time  series  is 
to  view  ?(A)  as  a  function  closest  to  f(A)  in  a  "distance"  between 
spectral  density  functions,  such  as 

H(f;f)  =  flog  f (A)  +  l^xyjdA 

l(f;f)  =  /°q55  -  log  ll^-ljdA  =  H ( f ; f )  -  H(f;f) 

L2L(f,f)  =  /^55{log  f  (A)  -  log  f  (A)  }2  dA  . 

The  class  of  functions  from  which  f(A)  is  chosen  can  be  specified 
or  constrained  either  parametrically  or  non-parametrically .  A 
parametric  constraint  is  to  choose  f(A)  from  a  family  of  functions 
f0(A)  indexed  by  a  finite  number  of  parameters  0.  A  non-parametric 
constraint  is  to  impose  a  smoothness  measure  on  f  such  as  the 
square  integral  of  second  derivatives: 

/^5|f"(A)|2  dA  or  /°of5|(log  f(X))"|  2dA  . 

One  then  seeks  to  choose  f  to  maximize  smoothness  while  minimizing 
a  measure  of  distance  of  f  from  ?  . 

Nonparametric  approaches  to  spectral  estimation  may  work  best 
for  estimation  of  the  log  spectral  density  using  an  approach  in¬ 
troduced  by  Wahba  (1980).  Motivated  by  the  estimation  distance 

f^Uog  f  ( A)  -  log  f(A))2dA  +  K/°^5|(log  f  (A) ) "  | 2  dA 

where  K  is  a  penalty  parameter  to  be  determined  adaptively  by  the 
data,  she  considers  estimators  of  the  form 

00 

log  f (A)  =  l  w(^)  y (v)  exp  (-2itivA) 

y=—oo 

where  y(v),  which  I  call  cepstral-cor relations,  are  defined  by 

Y (v)  =  J^q55  log  f(D  exp  (2irivA)  dA  , 

and  the  weights  w (v)  are  of  the  form 

w(v)  =  — ,  r  =  2  or  4  . 

1+v  r 

We  call  M  the  "half-power"  lag.  Tn  Section  7  we  discuss  how  one 
might  choose  M  and  r  to  minimize  an  estimator  of 
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J(f,f)  =  E  /°‘55  (log  f (A)  -  log  f ( A)  } 2  dX  , 

assuming  log  £(\)  has  finite  range  and  therefore  has  a  represen¬ 
tation 

00 

log  f(X)  =  £  y(v)  exp  (-2nivX)  . 


5.  PARAMETRIC  SPECTRAL  ESTIMATORS,  BIAS,  AND  VARIANCE 


A  spectral  density  estimator  is  called  parametric  if  it  is 
based  on  a  representation  of  the  spectral  density  as  a  function 
of  m  parameters  0,,  ...,  0  ,  which  we  denote  f  .  (X)  . 

i  m 

We  call  m  the  order,  and  it  is  also  often  a  parameter  to  be  esti¬ 
mated.  The  problem  of  model  identification,  or  model  approximation, 
is  to  estimate  m,  and  also  to  estimate  which  parameters  0^,  ...»  0m 
are  "significantly"  different  from  zero. 

The  true  spectral  density  is  denoted  by  f  or  f  .  A  best  ap¬ 
proximation  f  =  f-  -  can  be  determined  for  each  order 

U  ,  ,  a  •  •  1  O 

1  m 

m  where  0,,  ...,  6  minimizes  H(f;f.  .  ).  An  estimator 

1  m  vi »  • •  • » 

1  m 

of  f  is  f  =  f«  -  where  6.,  ...»  0  minimizes  H(f;f  , ...,  )■ 

U  .«•••»  X-  X  m  u  ■■  w 

1  m  1  m 

The  optimal  estimator  f  minimizes  R(f)  =  EKf^jf)  . 

When  using  approximating  parametric  densities  the  criterion  R(f) 

is  replaced  by  an  order  determining  criterion  C(m)  to  determine 

the  order  m  of  the  parametric  density.  One  can  write 

C(m)  =  B(m)  +  V(m,T)  , 

where  B(m)  =  I(fro;f)  ,  V(m,T)  =  EI(f;f)  . 

We  call  B(m)  tlie  model  approximation  error  (or  bias)  and  V(m,T) 
the  parameter  estimation  error  (or  variance).  As  m  — *  ®>  , 

B(m)  — }  0  and  V(m,T)  — »  °°  .  Consequently  C(m)  has  a  minimum. 

AIC  introduced  by  Akaike  (1974)  may  be  regarded  as  corresponding 
to 


B(m) 


H(f;f§ 


1’ 


§  }  =  log  5m 

m 


lo8  » 


V(m,T)  =  2m/T 


Other  order  determining  criteria  may  be  regarded  as  corresponding 
to  different  formulas  for  V(m,T): 

V(m,T)  =  (m/T)  log  log  T,  Hannan  and  Quinn  (1979); 

V(m,T)  =  (m/T)  log  T,  Schwarz  (1978)  . 

CAT(criterion  autoregressive  transfer  function)  is  an  order 


determining  criterion  for  autoregressive  spectral  estimators 
introduced  by  Parzen  (1974),  (1977);  one  version  is 

-2  8  -2  *-2  _ 


1  m 

CAT(m)=i  l 


8  -2  *-2  ,, 

T  jij  “j  '  -  •  "j  ‘  (1-T> 


j 
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We  would  like  to  emphasize  that  it  is  also  of  the  general  form 
B(m)  +  V(m,T)  ,  defining 

-2  „/  'v  \  1  V  -2 

,  V (m, F)  =  -  -  )  a  . 


B(m)  =  -<>  ^  +  () 
m 


tj=i  J 


6.  AUTOREGRESSIVE  SPECTRAL  ESTIMATORS 


The  most  convenient  parametric  estimators  are  autoregressive 
spectral  estimators  of  the  form 


,  /  \  \  2i  -2niX  -2itiXm|-2 

fa  (X)  =  a  1  +  a, e  +  ...  +  a  e 

»,m  1  m 

2 

The  parameters  are  a  ,  a  ,  . . . ,  o  as  well  as  the  order  m.  The 
subscript  0,m  is  merely  symbolic  ?o  indicate  that  f ( X )  is  a 
function  of  m  parameters  (in  addition  to  o  )  . 

Estimators  of  these  parameters  can  be  found  by  solving  "normal 


equations" 


l  a.  K(j,k)  =  0  ,  j  =  1 . 


k=0 


l  a. K(0,k)  =  «  . 


k=0 


where  K(j,k)  is  an  estimator  of  Kvj.k)  =  E[Y(t-j)  Y(t-k)]  .  The 
normal  equations  are  called  stationary  if  K(j,k)  is  chosen  to  be 
a  function  of  (j-k)  . 

Stationary  estimators  a. ,  ....  a  may  be  found  by  minimizing 

I  m 


\  r0.5  i.  ,  -2iriX  ,  -2nimX|2?,  . 

r(a^,...,am)  =  j_Q  ^  |1  +  a^e  +. . .+  a^e  |  f(X)dX  , 


m 


2  1' 

since  H(f ; fn)  =  log  o  +  —  J(a1,...,a). 

o  4  i  m 


We  hayenused  the  important  fact  that  log  1 1  +  a.e  ^  ^X +. .  . 

_‘”27TiinA  |  2  , .  _  n  _ . _ .4 _ _ .  j _ *.1 _ — 


+  a  e  . |  dA  =  0  under  the  assumption  that  the  characteristic 

polynomial  g  (z)  =  1  +  a-z  +  ...  +  a  z  has  all  its  roots  in  the 


complex  z-plane  outside  £he  unit  circle. 


Differentiating  H(f;f  )  with  respect  to  o  one  obtains 

-2  -  -  0 

o  =  J(u^,  ...»  u^)  . 


The  problem  of  minimizing  J(a^>  •••>  a^)  can  be  viewed  as  a 
problem  of  projection  in  the  Hilbert  space  of  functions  on  the 


unit  circle  with  the  inner  product 

<■0.5  ,  2uiX.  ,  .  2niX.  ,* 


,  .  CU.3  ,  ZTI1A.  ,  .  Z7I1A. 

-  =  J- o  5  gl(e  >  *82  6  ^  dX 


f 


J(u  ,  ...,  n  )  is  the  norm  squared  of  the  best  approximation  of  1 
by  a  linear  combination  of  . p2nimX  The  coefficient 


»  e 


The  coefficients 
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n 


,  ,  ....  n  are  determined  by  the  condition  that  g  (z)  =  1  +  a,z 

A...  +  jV,  «  -  e2'l»  •  -  •  >  -  1 

m 

Thus 


is  orthogonal  to  zJ  ,  j  =  1,  ....  m  . 

-  ru.3  -  ,  ZTilA.  —  2  it  1 A  i  ,, 

0  =  •'-0.5  8m(e  )  6  f(X)  dX 


=  y  «tp(k-j)  ,  j  =  1,  ....  m,  where  uQ  =  1  . 


k=0 


These  are  the  celebrated  sample  Yule-Walker  equations ,  or  Toeplitz 
normal  equations  for  the  autoregressive  coefficients.  The  estimator 
of“o2  is  o2  ,  called  the  residual  variance  or  prediction  error 


variance,  given  by 


-2 


=  (1,8m)~  =  l  °k  p(k)  • 

m  f  k=0  K 


It  cannot  be  too  strongly  emphasized  that  there  are  several 
ways  to  form  estimators  of  parameters  to  form  an  autoregressive 
spectral  density 

2iTiXm  i-2 


i  / 1 \  -2 | ,  -  2niX 

f(X)=al+a,  e  + 
m  m1  1 


+  a  e 

m 


Various  approaches  are  outlined  in  section  9.  When  the  coefficients 
are  computed  by  the  Yule-Walker  equations  f  is  called  the  Yule- 


Walker  autoregressive  spectral  estimator,  and  it  satisfies 


’  /-o55(1°e  '„<»>  +r7i)> d>  ■  ^  % 

m 


since  ^°8  ^m^X^  dX  =  ^°8  °i 

2  1 


-2 

m 


f0.5  f(X)  ,,  ~  2 1 , - 

f-0.5  TTxT  dX  ^  8m  1  1  f 

m 


?  (1,8)  =  1 

a2  ra  f 
m 


Akaike’s  AIC  (to  be  minimized  to  determine  significant 
orders  m)  is 


2m 


AIC(m)  =  B (m)  +  V(m,T)  =  H(?;f  )  +  _ 

m  T 


-2  2m 
log  +  T 


An  important  consequence  of  our  derivation  is  that  one  can 
evaluate  a  similar  criterion  for  other  ways  of  computing  an  auto¬ 
regressive  spectral  estimator  f  (X)  ; 

*  o  *2 

f  0.5  ?(X)  m  ,  -  „  N 

I  -o.5  FTxT  dX  =  oT  (-  1  •  usually) 

m  m 


defining 


*2 


_  f0-5 
m  '-0.5 

Consequently  an  order  determining  criterion  could  be 


,  ,  -  2tt iX  ,  .  -  2TiiXmi2;,  .  , 

1  +  a,e  +. . .+  a  e  f (X)  dX  . 

1  m 
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H(f;f  )  +  ^ 
m  T 


o  * 
m 


+  log  o  + 


2m 

T 


if  the  criterion  one  uses  for  the  match  of  model  to  data  is  one 
based  on  the  spectral  matching  of  f  to  f  (although  f  is  estimated 
using  non-stat ionary  autoregressive  models). 


7.  LOG  SPECTRAL  SMOOTHING  AND  CEPSTRAL  CORRELATIONS 


The  problem  of  estimation  of  the  spectral  density  of  a  time 
series  Y(.)  can  be  regarded  in  theory  as  determining  a  smooth 
function  f  (A)  which  optimally  fits  a  sample  spectral  density 
f y ( A )  .  (Note  that  to  compute  fy(A)  one  may  have  used  a  data 
window).  We  believe  that  the  best  fit  is  often  obtained  by  an 
iterated  spectral  estimator  which  uses  an  autoregressive  estimator 
to  match  the  large  scale  excursions  of  fy(A)  ,  and  then  uses  log 
spectral  smoothing  to  match  the  smaller  excursions.  The  autore¬ 
gressive  filter  often  has  the  effect  of  reducing  the  log-range  of 
the  spectrum,  without  following  fine  structure  which  is  present. 

The  fine  structure  which  is  left  in  the  residual  process  is  esti¬ 
mated  by  the  log  spectral  smoothing  estimator. 

For  long  memory  time  series,  the  iterated  spectral  estimator 
combines  (1)  an  order  1  or  2  autoregression  to  transform  to  a 
short  memory  time  series,  (2)  an  autoregression  to  prewhiten, 

(3)  log-spectral  smoothing. 

Autoregressive  spectral  estimation  phase.  Using  an  order 
determining  criterion,  and  either  stationary  or  non-stat ionary 
estimators  of  coefficients,  one  determines  an  autoregressive  fil¬ 
ter  g  (L)  ,  autoregressive  residual  variance  f  and  autoregressive 
m  ,  .  .  in 

spectral  density  estimator 


f  (*)  = 

m  m 


I  *  .  2uiA. i-2 

lgm(e 


The  residual  time  series  Y(t)  is  defined  by 


Y(t)  =  g  (L)  Y(t)  . 
m 

Autoregressive  spectral  estimators  are  superior  to  other  spectral 
estimators  when  the  length  of  the  observed  segment  of  a  time 
scries  is  short  compared  to  the  (long)  memory  of  the  correlation 
function  of  tin*  time  series. 

If  Y  were  regarded  as  white  noise,  one  would  regard  (^(A) 
as  the  estimated  spectral  density  of  the  time  series.  To  compen¬ 
sate  for  the  fact  that  Y  may  not  be  white  noise,  and  to  ease  the 
burden  of  requiring  Y(t)  to  be  white  noise,  we  estimate  its  spectral 
density  _ 

Residual  log  spectral  estimation  phase.  Between  the  sample 
spectral  densities  of  Y(t)  and  Y(t)  there  exists  a  basic  relation: 

fY(X)=°m  lgm(e  fY(X) 
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where 


fy(A)  dA 


This  relation  can  be  written: 


fyU) 


fy(M 


lOg  fy  (X  )  =  lOg 

m 


log  fm(A)  +  log  fy(A)  . 


Assuming  that  fy(A)  has  been  "prewhitened"  in  the  sense  of 
having  moderate  log  range,  we  smooth  log  fy(A)  to  form  an  estimator 
{log  f  ~  (  A) }*  .  Then  as  a  final  estimator  of  the  true  log  spectral 
density  f(A)  we  take,  up  to  a  normalizing  constant, 


{log  fy ( A) }  =  {log  fy (A) )  +  log  fm(A) 

To  smooth  a  log  spectral  density,  compute  cepstral  correlations 

-  1  k  k 

Y(v)  =  -  l  exp  (-2uiv— )  log  fy  (-)  , 

^  k =0  4  W 


for  v  =  0,  1,  ...,  T.  Define,  following  Wahba  (1980), 

{log  f o ( A )  } "  =  .57721  +  I  exp  (27iivA)  y(v)  - 

| v | <T  1 


1  +  (v/M)' 


where  r  is  an  integer  2  ;  usually  one  takes  r  =  4  or  r  =  2  , 
and  M  is  a  real  number  chosen  in  practice  to  be  an  integer  satis¬ 
fying  2  <  M  <  12  .  One  calls  M  the  half  power  point  of  the  estimate. 
To  introduce  a  criterion  for  the  choice  of  M,  define  g(^) 

.  r-  \  \  t  ?  /  ■.  \  /  _  \  fO.  5  _  /  n _ !  1  / 1  \  J 1 


=  log  f^(A)  ,  g (A )  =  log  fy  (A)  ,  y  (v)  =  !_q  5  exp  (2TriAv)g(A)dA , 

£w(A)  =  {log  fr>(A)}"  defined  above,  a  measure  of  the  goodness  of 
M  .  ,Y 

an  estimator  is 


Rn  =  E  /J|gM(X)  -  g(A)|2dA  =  ±  ^  E{gM(>j)  -  g ( j ) ) 2 

Following  Wahba  (1980),  to  minimize  one  minimizes  RH 
+  V(M,T)  ,  defining 


B(M)  = 


I  | Y(v) I  2  v4r 


a  +  $2rr2 


n  tt  ,  r™  ,,  ,  2r.-l  , 

V (M,T)  =  ^  -g-  4J  (1  +  u  )  du  . 

One  evalues  R^  for  various  values  of  M  (and  r);  one  chooses  for 
these  parameters  the  values  minimizing  .  The  iterated  spectral 
estimator  is  data  adaptive,  since  the  parameters  m  and  M  required 
to  compute  the  estimator  are  chosen  adaptively  through  order-deter- 


mining  (or  model  selection)  criteria. 


8.  MAXIMUM  ENTROPY  SPECTRAL  ESTIMATION 

To  discuss  the  philosophical  basis  of  the  maximum  entropy 
method  of  spectral  estimation  introduced  by  Burg  (1967),  we 
need  to  discuss  further  the  role  of  information  numbers  in  statis¬ 
tics.  To  a  sample  ( Y ( t )  ,  t  =  1,  ...»  T}  there  is  a  true  pro¬ 
bability  density  f(Y(l),  ....  Y(T))  ;  we  denote  by  f  (Y( 1) , . . . , Y(T) ) 
a  probability  density  function  which  is  a  function  of  parameters 
and  which  represents  a  model  for  the  true  probability  density. 

A  measure  of  the  discrepancy  between  f  and  f(1  is  the  Kullback- 
Liebler  informa t ion  number  or  directed  divergence 


I T ( f ;  f 0  )  =  |  Kf  [log  {-) 


f  (yL . yT) 

•'-'""i  <>, . 


)dy1- • -dyT 


Pinsker  (1963)  shows  that  in  the  limit  as  T 


2iT(f;fe)  - 


f0.5  ff  (X) 
i-O.5\f0(X) 


1  -  log 


ion 

f  e(x); 


dA 


=  H(f ;f0)  -  H(f ; f )  . 


We  can  distinguish  two  ways  to  use  this  formula,  (1)  a  stat¬ 
istical  or  data  analysis  approach,  and  (2)  a  probability  approach. 

A  data  analysis  approach  to  parameter  estimation  is  to  use  a  raw 
estimator  f  of  f  (which,  while  a  wiggly  estimator  of  f,  is  satis¬ 
factory  when  only  used  as  an  integrand)  to  form  an  estimator 
IT(f;f0)  of  lT(f;f0)  . 

In  contrast  to  the  data  based  approach  which  minimizes  H(f;fQ) 
over  0,  is  the  probability  approach  which  maximizes  H(f;f)  over 
all  functions  f  satisfying  a  set  of  constraints 

/°055  *  (X)  f(X)  dX  =  C.  ,  j  =  1,  ....  M 

for  specified  functions  i/>,(X)  .  An  example  of  a  set  of  constraints 
is  to  rec|u i re  the  first  mJcorrelations  of  f(X)  to  equal  sample 
correlations  p(j)  : 


/■0.5  2niX  j 

'-0.5  8 


f (X )  dX  =  p(j) 


0,  +  1, 


+  m  . 


Since  H(f;f)  =  (1  +  log  f(X)}  dX  , 

the  optimal  function  f (x)  is  called  a  maximum  enjrropy  estimator 
of  f ( X )  .  It  is  well  known  that  f(X)  has  the  form  of  an  auto- 
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regressive  spectral  density: 


f(A) 


2 

a 

m 


|1  + 


2niA 


2711X1111-2 

+  a  e 

m  1 


The  maximum  entropy  principle  provides  a  motivation  or  justi¬ 
fication  for  the  use  of  autoregressive  spectral  estimators.  However 
the  maximum  entropy  principle  provides  no  insight  into  iiow  to 
identify  an  optimal  order  m,  or  even  what  are  the. effects  of  dif¬ 
ferent  methods  of  estimating  the  parameters  o^,  a  . 

It  provides  no  guidance  for  how  to  learn  from  the  data  whether 
the  time  series  is  non-stat ionarv  (long  memory)  or  stationary 
(short  memory),  or  whether  the  best  time  series  model  is  AR,  MA, 
or  ARMA.  In  my  view  it  is  a  principle  for  deriving  probability 
models,  rather  than  statistically  fitting  models  to  data. 

It  should  be  realized  that  the  maximum  entropy  principle  justi¬ 
fies  autoregressive  estimators  only  for  short  memory  time  series 
(for  whom  log  f(X)  is  integrable).  Autoregressive  estimators  are 
justified  for  long  memory  time  series  by  the  fact  that  a  pure  har¬ 
monic  Y(t)  =  A  cos  — t  +  B  sin  — t  satisfies  Y(t)-4>Y(t-l)+Y(t-2)=0 

P  P 

where  <j>  =  2  cos  —  . 

P 


A  justification  of  autoregressive  estimators  for  short  memory 
time  series  that  I  prefer  is  the  existence  of  the  infinite  auto¬ 
regressive  scheme  representation  for  a  stationary  time  series  satis¬ 
fying:  spectral  density  f(A)  is  continuous  and  differentiable; 

f(A)  is  bounded  above  and  below;  f'(A)  is  square  integrable. 

Then  f(A)  has  an  infinite  autoregressive  representation 


.  2  |  ,  27iiX.  i-2 


where  g  (z)  =  1  +  a,  z  +  . . .  +  a  zm+ 

°a°  1 ,  co  m , 00 


9.  PARAMETRIZATION  OF  AUTOREGRESSIVE  SPECTRAL  ESTIMATORS 

There  are  many  approaches  for  forming  autoregressive  spectral 
estimators,  because  there  are  four  equivalent  ways  of  parametrizing 
them:  (A)  autoregressive  coefficients,  (B)  correlations,  (C)  par¬ 

tial  correlations,  and  (D)  residual  variances. 

2 

A.  Consider  autoregressive  coefficients  0  <  o  <  1,  a,  , ....  a 

m  -  1  ,m  ’  m,m 

such  thatg(z)  =  1  +  a.  z  +  ...  +  a  zm  satisfies  g(z)  ^  0  for 

X  |  m  m  y  in 

complex  z  such  that  |z|  <1  .  Thus  g(z)  is  a  minimum  phase  filter 
transfer  function.  These  coefficients  define  the  autoregressive 
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spectral  estimator  f  (A) 


I  ,  2  n  i  A  ,  -  2 
'*m  (C  )! 


1$.  Consider  correlation  coefficients  p(l),  (i(2),  o  (m) 

w  1  i i cl i  are  positive  definite.  The  correlation  coefficients  deter¬ 
mine  autoregressive  coefficients  by  solving  the  Yule  Walker 

equation  (with  a_  =  1) 

U  ,m 

m  2 

l  cij  mp(j-k)  =  0,  k  =  1,  .  .  .  ,  m;  =  0m  ,  k  =  0  . 

The  autoregressive  coefficients  determine  the  correlation  coef¬ 
ficients  by 

p(j)  =  /^q^5  exp  ( 2  tt  i  A  j  )  fm(A)  dA  . 

C.  Consider  coefficients  n(l),  II(m)  satisfying 

[  n  ( 1)  I  <■  1,  |ll(m)|  <  1  .  They  represent  partial  correlation 

coefficients  defined  theoretically  by:  n(j)  =  partial  correlation 
between  Y(t)  and  Y(t-j),  conditioned  on  Y(t-l),  Y(t-j+l)  . 

2  2 

D.  Consider  coefficients  a  ,  . a  ,  sign  11(1),  sign  n  (m) 

2  2  1  2  m 

satisfying  1  >  a,  >  o.  >  . . .  >  o  >  0  .  They  represent  residual 
X  Z  2  ^ 

variances  defined  by:  o.  =  mean  square  prediction  error  o£  Y(t) 
given  Y(t-l),  ....  Y(t-])  ,  expressed  in  units  of  E [ | Y ( t ) |  ]  . 


Partial  correlation  coefficients  determine  autoregressive 
coefficients  and  residual  variances  by  the  Levinson  recursion 
(see  Makhoul  (1977)): 

H(k)  , 


k,k 


a . 
J 


a . 


=  a 


j  ,k-l 

2 


-n(k)  a 


k-j,k-l’ 


k-1 


(i  -  n  (k) > 


Residual  variances  determine  partial  correlation  coefficients 
by  a  formula  due  to  Dickenson  (1978) 


n(k)  =  sign  II (k)  / 


1  - 


_k 

2 

Vl 


by 


Autoregressive  coefficients  determine  partial  correlations 
the  recursion  (Barndorf-Nielsen  and  Schon  (1973)) 


“j,k-l  =  ,1_ 


l^(k)  1  ^{«  ,  + 

J  *  K 


l(k)  “k-j,k} 


In  summary,  to  form  f  (A)  one  can  specify  any  one  of  the  four 
paramet rizat ions .  Given  correlations,  to  solve  the  Yule-Walker 
equations  one  has  many  approaches:  (])  SWF.RP,  (2)  Cholesky  decom¬ 
position,  (3)  Lcvinson-Durb in  recursion,  which  computes  partial 
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correlation  coefficients  by 
k-i 

-n(k)  =  l  a  ,  ,  P (k-j ) /  a  , 

j=0  J* 

and  (4)  Levinson-Whittle-Robinson  recursion  which  computes  IT (k) 
using  forward  and  backward  prediction  error  coefficients  (see 
Kailath  (1974)). 

HI.  AN  OUTLINE  OF  EMPIRICAL  SPECTRAL  ANALYSIS 

Successive  stages  of  analysis  whose  outputs  are  combined  to 
form  estimators  of  the  spectrum  of  a  single  time  series  are: 

Data  Transformation  and  Detrending 
Data  Windowing 
Extend  with  Zeroes 
Fourier  Transform 

Average  Short-time  Segment  Spectral  Density  Estimators 
Sample  Spectral  Density,  Sample  Spectral  Distributions 

Spectral  Average  Direct  Spectral  Density  Estimators 
Sample  Correlations 

Indirect  Lag  Window  Spectral  Density  Estimators 
Autoregressive  Coefficients,  Yule  Walker  Equations 
Autoregressive  Spectral  Density  Estimators 
Autoregressive  Order  Determination  AIC  CAT 
Memory  Identification,  ARMA  Identification 

Autoregressive  Coefficients:  Nonstationary  Least  Squares, 

Lattice  Algorithms.  Kalman  Filtering 
Autoregressive  Transformation  of  Y  to  Y 

If  Y  long  memory,  either  seek  Y  short  memory  and  return  to 
data  transformation  stage  or  go  to  long  memory  mixed  or  band- 
limited  methods  listed  below. 

If  Y  short  memory,  seek  whitening  filters 
Log  Spectral  Density  Estimators  of  Y,  via  cepstral  correlations 
Iterated  Adaptive  Spectral  Density  Estimators  of  Y 
Subset  ARMA  Identification 
S-Array  ARMA  Identification 
ARMA  Spectral  Density  Estimator  of  Y 
Other  spectral  analysis  procedures: 

Robust  Autoregressive  Transformation  of  Y  to  Y 
Mixed  Spectral  Estimation  (Long  Memory) 

Bandlimited  Noise  Spectral  Estimation  (Long  Memory) 

New  techniques  under  research: 

Nonparametric  Data  Modeling  of  Sample  Spectral  Density 
Spectral  De-whitening 

A  good  description  of  techniques  for  reliably  estimating  the 
spectrum  is  in  Thomson  (1977).  We  must  conclude  our  outline 
here  due  to  space  limitations. 
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